Abstract

The asymptotic distribution of an M-estimator is studied when the underlying
distribution is discrete. Asymptotic normality is shown to hold quite
generally within the assumed parametric family. When the specification of the
model is inexact, however, it is demonstrated that an M-estimator with a
non-smooth score function, e.g. a Huber estimator, has a non-normal limiting
distribution at certain distributions, resulting in unstable inference in the
neighborhood of such distributions. Consequently, smooth score functions are
proposed for discrete data.

AMS  1970  Subject  Classification:  62E20,  62F10,  62G35. 

1. Introduction

M-estimation, originally proposed by Huber (1964) to estimate a location
parameter robustly, has since been applied successfully to a variety of
estimation problems where stability of the estimates is a concern. There is, for
instance, a substantial body of literature on M-estimation for regression
models; see Krasker and Welsch (1982) for a recent review. For further
references on M-estimation, see Huber (1981).

Much of the popularity of M-estimators can be attributed to their flexibility.
Desired properties of an M-estimator, such as relative insensitivity
to or rejection of extremely outlying data points, can be specified in a
direct way since the influence function of an M-estimator is proportional to
its score function; see Hampel (1974) or Huber (1981) for details.

Surprisingly,  M-estimation  for  discrete  data  seems  to  have  received  little 
attention.  Discrete  data  are  no  less  prone  than  continuous  measurements  to 
outliers  or  partial  deviations  from  an  otherwise  reasonable  model;  see,  for 
instance,  data  from  mutation  research  presented  in  Simpson  (1985).  This 
paper  investigates  some  aspects  of  M-estimation  for  discrete  data. 

A useful optimality theory has been developed by Hampel (1968, 1974) for
robust M-estimation of a univariate parameter. His general prescription
facilitates the construction of robust M-estimators with nearly optimum
efficiency at a specified model. Proposals for robust estimation of the binomial
and Poisson parameters, for instance, can be found in Hampel (1968). Hampel's
univariate theory is briefly reviewed in Section 2. Extensions of this
optimality theory to certain multivariate models are discussed in Krasker (1980),
Krasker and Welsch (1982), Ruppert (1985), and Stefanski, Carroll, and Ruppert
(1985).

The score function for Hampel's optimal M-estimator is not smooth, that
is, it is not everywhere differentiable. This can lead to complications in
the asymptotic theory when the data are discrete. For instance, Huber (1981,
p. 51) considers the case where the underlying distribution is a mixture of
a smooth distribution and a point mass. He observes that if the point mass
is at a discontinuity of the derivative of the score function, then an
M-estimate for location has a non-normal limiting distribution. Along the same
lines, Hampel (1968, p. 97) notes that the optimal M-estimate for the Poisson
parameter is asymptotically normal at the Poisson distribution, provided the
truncation points of the score function are not integers. He conjectures
that "under any Poisson distribution, it is asymptotically normal (with the
usual variance); however, this remains to be seen."

This paper provides extensions to the asymptotic distribution theory of
M-estimators especially relevant to discrete data, although Theorem 1 is
somewhat broader in scope. The main results are given in Section 3. Among
the applications of the theory are a more complete account of the asymptotics
of the Huber M-estimate for location and a proof of Hampel's conjecture.

Aside from providing a more complete asymptotic theory for M-estimation, the
results have implications for choosing a score function when the data are
discrete. These are discussed in the final sections. In particular, smooth
score functions are proposed.

2.  Parametric  M-estimation:  Definitions,  optimality  and  examples 

Suppose $X_1, X_2, \ldots$ are independent observations, each thought to have
distribution function (d.f.) $F_\theta$, where $\theta$ belongs to a parameter set $\Theta$;
here $\Theta$ is a subset of $R^d$, $d \ge 1$. Define

(2.1)  $M(t;\psi,F) = \int \psi(\cdot,t)\,dF$,

where $F$ is a d.f. on $R$, $\psi(\cdot,\cdot)$ is a measurable real-valued function on
$R \times \Theta$, and $t \in \Theta$. Then $T_n$ is an M-estimator for $\theta$, based on a sample of size
$n$, if it solves an equation of the form

(2.2)  $M(T_n;\psi,F_n) = 0$,

where $F_n$ is the empirical d.f. The standard requirement

(2.3)  $M(\theta;\psi,F_\theta) = 0$,  $\theta \in \Theta$,

and additional regularity conditions ensure that $T_n$ consistently estimates
$\theta$ when the model is correct.

Suppose now that $\Theta \subset R$. The influence function at $F_\theta$ of an M-estimator
for $\theta$ has the form

$\Omega(x,\theta) = \dfrac{\psi(x,\theta)}{-\int \dot\psi(\cdot,\theta)\,dF_\theta}$,

provided this exists; here $\dot\psi$ denotes the derivative of $\psi$ with respect to
the parameter. Assume $F_\theta$ has a density $f_\theta$ with respect to a suitable
measure, and assume the parametrization is smooth. Letting
$\ell(x,\theta) = \frac{\partial}{\partial\theta}\log f_\theta(x)$, the optimal score according to Hampel's
criterion has the form

(2.4)  $\psi_{c(\theta)}(\ell(x,\theta) - a(\theta))$,

where

$\psi_c(u) = \begin{cases} u, & |u| \le c, \\ c\,\mathrm{sign}(u), & |u| > c, \end{cases}$

and $a(\theta)$ is defined implicitly by (2.3). This estimator cannot be dominated
by any M-estimator simultaneously with respect to the asymptotic variance and
the bound on the influence function at $F_\theta$. This is assuming, of course, that
the estimator is asymptotically normal at $F_\theta$.

The truncation point $c(\theta)$ determines the bounds on $\Omega(\cdot,\theta)$ and hence the
robustness of the estimator to outlying data points. Observe that the maximum
likelihood estimator has the form (2.4) with $c(\theta) = \infty$ and $a(\theta) = 0$.

Two examples given in Hampel (1968) will be of special interest here.

Example 1. If $F_\theta$ is the normal d.f. with mean $\theta$ and unit variance then
$\ell(x,\theta) = x - \theta$. By symmetry $a(\theta) \equiv 0$, and constant variance suggests setting
$c(\theta) = c$. The resulting estimator, with score $\psi_c(x - \theta)$, is the Huber (1964)
M-estimator for location.

Example 2. If $F_\theta$ is the Poisson d.f., with density $f_\theta(x) = e^{-\theta}\theta^x/x!$ on
$x = 0, 1, 2, \ldots$, then $\ell(x,\theta) = x\theta^{-1} - 1$. Hampel (1968, p. 96) suggests taking
$c(\theta) = c\theta^{-1/2}$ on the grounds that $\ell(x,\theta)$ has standard deviation $\theta^{-1/2}$. For
this choice (2.4) is equivalent to $\psi_c(x\theta^{-1/2} - \theta^{1/2} - a(\theta))$. The version

(2.5)  $\psi_c(x\theta^{-1/2} - \beta(\theta))$,

where $\beta(\theta) = \theta^{1/2} + a(\theta)$ is defined by (2.3), is slightly more convenient.
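The Poisson score (2.5) can be made concrete with a short numerical sketch. The Python code below is an illustration only, not the authors' implementation: the function names (`huber_psi`, `beta`, `poisson_m_estimate`), the truncation of the Poisson sum, the bisection brackets, and the sample data are all assumptions made for this example. It computes the centering constant from the requirement (2.3) and then solves the estimating equation (2.2) for the parameter.

```python
import math

def huber_psi(u, c):
    """Huber's psi_c: the identity on [-c, c], clipped to -c or c outside."""
    return max(-c, min(c, u))

def poisson_pmf(x, t):
    """Poisson density e^{-t} t^x / x!, evaluated on the log scale."""
    return math.exp(-t + x * math.log(t) - math.lgamma(x + 1))

def beta(t, c, iters=80):
    """Centering constant beta(t): root in b of E psi_c(X t^{-1/2} - b) = 0.

    The expectation is under Poisson(t), with the sum cut off at x_max
    (an illustrative choice whose neglected tail mass is negligible here).
    The expectation is nonincreasing in b, so bisection suffices.
    """
    x_max = int(t + 12.0 * math.sqrt(t) + 40.0)
    root_t = math.sqrt(t)
    pmf = [poisson_pmf(x, t) for x in range(x_max + 1)]

    def expected_score(b):
        return sum(huber_psi(x / root_t - b, c) * p for x, p in enumerate(pmf))

    lo, hi = -c, x_max / root_t + c   # expected_score(lo) > 0 > expected_score(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_m_estimate(data, c, lo=1e-3, hi=50.0, iters=60):
    """Solve sum_i psi_c(x_i t^{-1/2} - beta(t)) = 0 for t by bisection.

    The bracket [lo, hi] is an assumption of this sketch: the score sum
    must be positive at lo and negative at hi.
    """
    def score_sum(t):
        b = beta(t, c)
        return sum(huber_psi(x / math.sqrt(t) - b, c) for x in data)

    if not (score_sum(lo) > 0.0 > score_sum(hi)):
        raise ValueError("bracket does not contain a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score_sum(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

counts = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1]   # made-up counts, for illustration
print(poisson_m_estimate(counts, c=1.5))
```

With a very large truncation constant the score is never clipped on these data, so the estimate should reduce to the sample mean, which gives a simple sanity check on the sketch.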

3. Extended asymptotic distribution theory

Conditions for consistency of an M-estimator can be found in Huber (1964,
1967, 1981). Since the smoothness of $\psi$ plays no role in the consistency proofs,
consistency will usually be assumed here.

Huber (1981, theorems 3.2.4 and 6.3.1) shows under quite general conditions
that if $T_n \to \theta = T(G)$ in probability as $n \to \infty$ then

(3.1)  $-n^{1/2} M(T_n;\psi,G) = n^{-1/2}\sum_{i=1}^{n} \psi(x_i,\theta) + o_p(1)$,

where $M$ is given by (2.1). In particular, $\psi$ need not be differentiable;
monotonicity or Lipschitz continuity conditions are sufficient. That $T_n$ is
asymptotically normal follows immediately from (3.1) provided $M(t;\psi,G)$ has a
non-zero derivative at $\theta$ and $0 < \int \psi^2(\cdot,\theta)\,dG < \infty$; see Corollary 6.3.2 of
Huber (1981). For stronger almost sure representations for $T_n$ under stronger
conditions, see Carroll (1978a, 1978b).

To avoid Lipschitz conditions for score functions like (2.5) that have
implicitly defined centering parameters, the following lemma is useful. The
proof is contained in the proof of Theorem 2.2 of Boos and Serfling (1980).
Denote by $\|\cdot\|_V$ the total variation norm, given by

$\|h\|_V = \lim \sup \sum_{i=1}^{k} |h(x_i) - h(x_{i-1})|$,

where the supremum is over partitions $a = x_0 < x_1 < \ldots < x_k = b$ of $[a,b]$, and the
limit is as $a \to -\infty$, $b \to \infty$.

Lemma 1. Let $X_1, X_2, \ldots$ be independent, each with d.f. $G$, and let $\theta = T(G)$.
Suppose $\psi(x,t)$ is continuous in $x$ for $t \in \Theta \subset R^d$ and

$\lim_{t \to \theta} \|\psi(\cdot,t) - \psi(\cdot,\theta)\|_V = 0$.

If $T_n \to \theta$ in probability as $n \to \infty$, then (3.1) holds.

Remark. The score functions of Examples 1 and 2 are continuous in total
variation. For the former see Boos and Serfling (1980). For the latter, see
Simpson (1985).


When the underlying distribution is discrete, the set of points where $\psi$
fails to have a derivative has positive probability for certain parameter
values. In light of (3.1), it is natural to ask whether $M$ can have a
derivative at such parameter values, i.e., whether $T_n$ can be asymptotically
normal.

The following theorem addresses this question. For $\theta \in \Theta \subset R^d$, $F_\theta$ is
assumed to have a density $f_\theta = f(\cdot,\theta)$ with respect to a $\sigma$-finite measure $\mu$, and
$\psi_\theta = \psi(\cdot,\theta)$ is measurable for each $\theta$. Let $\|\cdot\|$ denote any norm on $R^d$ equivalent
to the Euclidean norm. Some regularity conditions are needed:

(A1)  There are measurable functions $\omega_t = \omega(\cdot,t)$ and $g_t = g(\cdot,t)$ for which
      $\int \ldots\,d\mu$, $\int |\psi_t|\,g_t\,d\mu$ and $\int \omega_t g_t\,d\mu$ are finite and, for some $\delta > 0$,

      (i)  $|f_s - f_t| \le \|s - t\|\,g_t$, and
      (ii) $|\psi_s| \le \omega_t$

      almost everywhere [$\mu$] (a.e.) when $\|s - t\| < \delta$;

(A2)  There is a measurable $R^d$-valued function $\dot f_t = \dot f(\cdot,t)$ such that

      $|f_s - f_t - (s - t)^T \dot f_t| = o(\|s - t\|)$ a.e.;

(A3)  $\psi_s \to \psi_t$ a.e. as $s \to t$.

Theorem 1. If for each $t \in \Theta$ (A1) - (A3) hold and

(3.2)  $M(t;F_t) = 0$

then

(3.3)  $D_s M(s;F_t)|_{s=t} = -\int \psi_t \dot f_t\,d\mu$,

where $D_s$ denotes vector differentiation, and where the dependence of $M$ on $\psi$
has been suppressed.
Proof. For $s, t \in \Theta$,

$0 = M(s;F_t) - M(t;F_t) + M(s;F_s) - M(s;F_t)$

$\phantom{0} = M(s;F_t) - M(t;F_t)$

(3.4)  $\phantom{0 =} + M(t;F_s) - M(t;F_t) + R_t(s)$,

where $R_t(s) = \int (\psi_s - \psi_t)(f_s - f_t)\,d\mu$

and (3.2) was used. The integrand of $R_t(s)$ is dominated in absolute value
by $2\|s - t\|\,\omega_t g_t$ on $\|s - t\| < \delta$ because of (A1). Hence, by (A3) and Dominated
Convergence,

(3.5)  $R_t(s) = o(\|s - t\|)$.

Similarly, (A2) and Dominated Convergence imply

(3.6)  $|M(t;F_s) - M(t;F_t) - (s - t)^T \int \psi_t \dot f_t\,d\mu| \le \int |\psi_t|\,|f_s - f_t - (s - t)^T \dot f_t|\,d\mu = o(\|s - t\|)$ as $s \to t$,

since the integrand is dominated by $2\|s - t\|\,|\psi_t|\,g_t$ on $\|s - t\| < \delta$. From
(3.4) to (3.6) conclude

$|M(s;F_t) - M(t;F_t) + (s - t)^T \int \psi_t \dot f_t\,d\mu| = o(\|s - t\|)$.

Hence $D_s M(s;F_t)$ exists at $t$ and is given by (3.3).

Remarks. 1. Note that $\psi_t$ need not be differentiable.

2. When $\psi_t = \dot f_t / f_t$, (3.3) generalizes the usual information identity.

3. Huber (1981, p. 51) observes a special case, namely (3.3) holds when $\mu$
is Lebesgue measure, $\psi(x,t) = \psi(x - t)$, where $\psi(\cdot)$ is skew-symmetric about zero,
and $f(x,t) = f(x - t)$, where $f(\cdot)$ is differentiable and symmetric about zero.

4. Equation (3.3), when it holds, also guarantees that the influence function
at the model, given by

$-\{D_s M(s;F_t)|_{s=t}\}^{-1}\psi(x,t)$,

is defined for each $t \in \Theta$, provided that $\int \psi_t \dot f_t\,d\mu \ne 0$.

Example 2 (continued)  Suppose $f(x,t) = e^{-t}t^x/x!$ on $\{0,1,2,\ldots\}$, $t > 0$. Recall
that the optimal M-estimator has the score $\psi(x,t) = \psi_c(xt^{-1/2} - \beta)$. This
estimator is known to be asymptotically normal at the Poisson distribution when $t$
is in one of the open intervals where neither of the truncation points
$t^{1/2}(\beta \pm c)$ is an integer; see Hampel (1968, p. 97).

To show that it is asymptotically normal at every Poisson distribution,
as conjectured by Hampel, first use Theorem 1 with
$g(x,t) = e^{\delta} f(x-1,t+\delta) + \delta^{-1}(e^{\delta} - 1 - \delta) f(x,t)$, $\omega(x,t) = c$ and
$\dot f(x,t) = f(x-1,t) - f(x,t)$. Note that $c > 1$ is sufficient for $\beta$ to be
continuous, and hence for (A3); see Simpson (1985).

Since Lemma 1 applies and $0 < \int \psi_t^2 f_t\,d\mu < c^2$ for $c > 1$, it follows that the
estimator is asymptotically normal at every Poisson distribution if it is
consistent. For consistency see Hampel (1968, p. 96) and Theorem 2 of Huber
(1967).
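The content of (3.3) at a parameter value where a truncation point is an integer can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: the helper names, the truncation point `X_MAX`, the step size `h`, and the bisection solver for the centering constant are all choices made for this example. It fixes the Poisson mean at 0.25 and picks the truncation constant so that the upper truncation point equals 1, then compares left and right difference quotients of $M(s;F_t)$ at $s = t$ with the right-hand side of (3.3).

```python
import math

X_MAX = 60  # cut-off for Poisson sums; ample for means near 0.25

def huber_psi(u, c):
    """Huber's psi_c."""
    return max(-c, min(c, u))

def poisson_pmf(x, t):
    """Poisson density, taken to be zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-t + x * math.log(t) - math.lgamma(x + 1))

def beta(s, c, iters=80):
    """Centering constant: root in b of E psi_c(X s^{-1/2} - b) = 0, X ~ Poisson(s)."""
    root_s = math.sqrt(s)
    pmf = [poisson_pmf(x, s) for x in range(X_MAX + 1)]
    lo, hi = -c, X_MAX / root_s + c
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        value = sum(huber_psi(x / root_s - mid, c) * p for x, p in enumerate(pmf))
        if value > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def M(s, t, c):
    """M(s; F_t) = sum_x psi_c(x s^{-1/2} - beta(s)) f(x, t)."""
    b = beta(s, c)
    root_s = math.sqrt(s)
    return sum(huber_psi(x / root_s - b, c) * poisson_pmf(x, t)
               for x in range(X_MAX + 1))

def theorem1_rhs(t, c):
    """-sum_x psi(x, t) f_dot(x, t), with f_dot(x, t) = f(x-1, t) - f(x, t)."""
    b = beta(t, c)
    root_t = math.sqrt(t)
    return -sum(huber_psi(x / root_t - b, c)
                * (poisson_pmf(x - 1, t) - poisson_pmf(x, t))
                for x in range(X_MAX + 1))

t = 0.25
c = t ** -0.5 * math.exp(-t)       # chosen so that t^{1/2}(beta(t) + c) = 1
h = 1e-5                           # finite-difference step (illustrative)
left = (M(t, t, c) - M(t - h, t, c)) / h
right = (M(t + h, t, c) - M(t, t, c)) / h
rhs = theorem1_rhs(t, c)
print(left, right, rhs)
```

Both one-sided quotients should agree with the right-hand side of (3.3) up to the finite-difference error, even though the score has a kink at an atom of the distribution.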

In Theorem 1, (3.2) allows smoothness of the parametrization to be
substituted for smoothness of $\psi$ within the assumed parametric model, so that
the estimator is asymptotically normal under further conditions. If the
specification of the model is inexact (as is often suspected), no result like
(3.3) is available. In certain cases, it is still possible to obtain the
limiting distribution of $T_n$ from (3.1).

Assume for simplicity that $\Theta$ is an open subset of the real line. The
score functions used for robust estimation are generally at least piecewise
differentiable. The one-sided derivatives of $M(t;G)$ will then exist, in
general, even when $M$ fails to be differentiable. Write

$m(t;G) = \frac{d}{dt} M(t;G)$

when the derivative exists. By a well-known result from calculus, if
$m(\theta-;G)$ and $m(\theta+;G)$ exist, they are equal to the corresponding one-sided
derivatives of $M(t;G)$ at $\theta$; see, e.g., Franklin (1940, p. 113).

Theorem 2. Suppose for some $\theta$ interior to $\Theta$ that $M(\theta;G) = 0$, and let $T_n$ be a
zero of $M(t;F_n)$, $n = 1, 2, \ldots$, where $F_n$ is the empirical d.f. Assume the
following:

(B1)  $m(\theta-;G)$ and $m(\theta+;G)$ exist finitely and are non-zero and of
      the same sign;

(B2)  $0 < \sigma < \infty$, where $\sigma^2 = \int \psi^2(\cdot,\theta)\,dG$;

(B3)  $T_n \to \theta$ in probability as $n \to \infty$, and (3.1) holds.

Then

(3.7)  $\lim_{n \to \infty} \sup_{-\infty < z < \infty} |\mathrm{pr}(n^{1/2}(T_n - \theta) \le z) - H(z)| = 0$,

where

$H(z) = \begin{cases} \Phi(|m(\theta+;G)|\,z/\sigma), & z \ge 0, \\ \Phi(|m(\theta-;G)|\,z/\sigma), & z < 0, \end{cases}$

and $\Phi$ is the standard normal d.f.
Remarks. 1. Huber (1964, p. 73) alludes to a similar result for a location
estimator.

2. The requirement that $m(\theta\pm;G)$ have the same sign is actually implied by
the remaining conditions. If the one-sided derivatives were to have opposite
signs, $M(t;G)$ would not change signs in a neighborhood of $\theta$ and (3.1) would
not hold.

The proof of Theorem 2 is deferred to the Appendix.

Example 1 (continued)  Recall that the Huber M-estimator for location has the
score $\psi(x,t) = \psi_c(x - t)$. For any d.f. $G$, $M(-\infty;G) = c = -M(\infty;G)$, and $M(t;G)$ is
continuous in $t$ so it has a zero $\theta$. Assume $\theta = 0$. This is unique if
$G(c-) > G(-c+)$, in which case $T_n \to 0$ in probability by Proposition 2.2.1 of
Huber (1981). Since $\psi_c$ is continuous in total variation, (3.1) holds by
Lemma 1. Letting $\dot\psi(x,t) = (d/dt)\,\psi_c(x - t) = -\psi_c'(x - t)$ if it exists, observe
that $-\dot\psi(x,t-) = I(-c \le x - t < c)$ and $-\dot\psi(x,t+) = I(-c < x - t \le c)$, where $I(\cdot)$
denotes the indicator function. Bounded convergence yields
$-m(0-;G) = G(c-) - G(-c-)$ and $-m(0+;G) = G(c+) - G(-c+)$. Hence, by Theorem 2,
$n^{1/2}T_n$ is asymptotically normal if $G(c+) - G(c-) = G(-c+) - G(-c-)$; otherwise,
it has a limiting distribution consisting of the left and right halves of two
normal distributions with different variances (cf. Huber (1981, p. 51)).
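The two-piece limit in Theorem 2 can be evaluated explicitly for a contaminated normal. The following sketch is illustrative only: the contaminated distribution (a standard normal with equal point masses placed at $c$ and $-2c$, so that the zero of $M$ stays at the origin) and all function names are assumptions introduced here, not taken from the text. It computes the one-sided slopes from the formulas of Example 1 and the resulting limiting d.f. $H$.

```python
import math

def Phi(z):
    """Standard normal d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def huber_limit(c, eps):
    """One-sided slopes m(0-), m(0+) and sigma for the Huber score at the
    hypothetical G = (1 - 2 eps) N(0,1) + eps delta_{c} + eps delta_{-2c}.

    The point masses contribute +c*eps and -c*eps to M(0; G), so M(0; G) = 0.
    """
    central = 2.0 * Phi(c) - 1.0                    # N(0,1) mass of (-c, c)
    m_minus = -(1.0 - 2.0 * eps) * central          # -m(0-) = G(c-) - G(-c-)
    m_plus = -((1.0 - 2.0 * eps) * central + eps)   # -m(0+) = G(c+) - G(-c+)
    # E psi_c(Z)^2 for Z ~ N(0,1)
    second_moment = central - 2.0 * c * phi(c) + 2.0 * c * c * (1.0 - Phi(c))
    sigma = math.sqrt((1.0 - 2.0 * eps) * second_moment + 2.0 * eps * c * c)
    return m_minus, m_plus, sigma

def H(z, c, eps):
    """Limiting d.f. of n^{1/2} T_n from Theorem 2 for the G above."""
    m_minus, m_plus, sigma = huber_limit(c, eps)
    slope = abs(m_plus) if z >= 0 else abs(m_minus)
    return Phi(slope * z / sigma)

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, H(z, c=1.5, eps=0.05))
```

Because the jump of this $G$ at $c$ is not matched by a jump at $-c$, the two slopes differ and the lower and upper tails of $H$ are not mirror images of each other.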

4.  A  counterexample 

It is instructive to examine the extent of the non-normality that occurs
in a specific example. Consider again the optimal M-estimator for the Poisson
parameter. The score function is

$\psi(x,t) = \psi_c(xt^{-1/2} - \beta) = \begin{cases} -c, & x < \ell(t), \\ xt^{-1/2} - \beta, & \ell(t) \le x \le h(t), \\ c, & h(t) < x, \end{cases}$

where $\ell(t) = t^{1/2}(\beta(t) - c)$ and $h(t) = t^{1/2}(\beta(t) + c)$.

Let $G$ be the actual d.f. and let $\theta = T(G)$. The simplest situation is when
$\theta$ is small. Assume henceforth that $\ell(\theta) < 0 < h(\theta) = 1$. Calculation yields
$\beta(t) = c(e^t - 1)$ for $\ell(t) < 0$, $0 < h(t) \le 1$, and
$\beta(t) = c\{e^t(1+t)^{-1} - 1\} + t^{1/2}(1+t)^{-1}$ for $\ell(t) < 0$, $1 < h(t) \le 2$. Since $\beta$ is
continuous, equating the two expressions at $\theta$ gives

(4.1)  $\theta^{1/2} e^{\theta} = c^{-1}$.
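The calculation behind the two expressions for the centering constant and (4.1) is short enough to spell out. The steps below are a sketch that uses only the requirement (2.3) and the three-part form of the score; in the first case only $x = 0$ falls in the linear part, in the second case $x = 0$ and $x = 1$ do.

```latex
\begin{align*}
% Case 0 < h(t) <= 1: psi(0,t) = -beta(t), psi(x,t) = c for x >= 1.
0 &= -\beta(t)\,e^{-t} + c\,(1 - e^{-t})
  &&\Longrightarrow\quad \beta(t) = c\,(e^{t} - 1), \\
% Case 1 < h(t) <= 2: psi(1,t) = t^{-1/2} - beta(t), psi(x,t) = c for x >= 2.
0 &= -\beta(t)\,e^{-t} + \{t^{-1/2} - \beta(t)\}\,t e^{-t}
     + c\,\{1 - (1+t)e^{-t}\}
  &&\Longrightarrow\quad \beta(t) = c\,\{e^{t}(1+t)^{-1} - 1\} + t^{1/2}(1+t)^{-1}, \\
% Equating the two expressions at t = theta.
c\,e^{\theta}\{1 - (1+\theta)^{-1}\} &= \theta^{1/2}(1+\theta)^{-1}
  &&\Longrightarrow\quad \theta^{1/2} e^{\theta} = c^{-1}.
\end{align*}
```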

The one-sided derivatives of $\beta$ at $\theta$ are $\beta'(\theta-) = ce^{\theta}$ and
$\beta'(\theta+) = \frac{1}{2}ce^{\theta}(1+\theta)^{-1}$, where (4.1) was used. Note that $\beta$ is strictly
increasing at $\theta$. Since $\psi_c'(c-) = 1$ and $\psi_c'(c+) = 0$,

(4.2)  $-\dot\psi(x,\theta-) = \begin{cases} ce^{\theta}, & x = 0, \\ 0, & x = 1, 2, \ldots, \end{cases}$

and

(4.3)  $-\dot\psi(x,\theta+) = \begin{cases} \frac{1}{2}ce^{\theta}(1+\theta)^{-1}, & x = 0, \\ \frac{1}{2}ce^{\theta}\{\theta^{-1} + (1+\theta)^{-1}\}, & x = 1, \\ 0, & x = 2, 3, \ldots \end{cases}$

Suppose $G$ is a mixture of a Poisson distribution $F_t$ and a point mass at an
integer $z$, i.e., $G = (1 - \varepsilon)F_t + \varepsilon\delta_z$. Assume $z > h(\theta)$ so $\dot\psi(z,\theta\pm) = 0$. From (4.2)
and (4.3)

(4.4)  $\dfrac{m(\theta+;G)}{m(\theta-;G)} = \dfrac{1}{2}\left(\dfrac{t}{\theta} + \dfrac{1+t}{1+\theta}\right)$,

where $m(\theta-;G) = -ce^{\theta - t}(1 - \varepsilon)$. The ratio (4.4) is unity only when $t = \theta$, which
corresponds to $\varepsilon = 0$ or $z = t$. By Theorem 2, the limiting distribution of
$n^{1/2}(T_n - \theta)$ consists of the right and left halves of two normal distributions.
The ratio of their standard deviations is (4.4).

Solving $0 = M(\theta;G) = c\{1 - (1 - \varepsilon)e^{\theta - t}\}$ yields $t = \theta + \log(1 - \varepsilon)$. Table 1
shows the values of $t$ and (4.4) for several values of $\varepsilon$ when $\theta = 0.25$ and
$c = \theta^{-1/2}e^{-\theta} = 1.5576\ldots$ (see (4.1)). In addition, the effect on a nominal .05
tail probability is shown.

For very small values of $\varepsilon$ the effect is minimal, which accords with the
robustness of $T_n$ in the sense of weak* continuity (see Hampel (1971)), since
it is asymptotically normal at the model. As $\varepsilon$ increases, however, the
effect becomes more serious, and inference based on $T_n$ can be substantially
biased.

For related work see Stigler (1973), who observes that a bias of this
type can arise when the trimmed mean is used for discrete or grouped data.

Table 1  Effect of contaminating mass $\varepsilon$ with $\theta = 0.25$ fixed

  $\varepsilon$      $t$        $r$ = (4.4)    $\Phi(-1.645r)$
  0       0.25     1            .05
  0.01    0.24     0.976        .054
  0.05    0.199    0.877        .074
  0.10    0.145    0.748        .109
  0.15    0.087    0.610        .158
  0.20    0.027    0.465        .222
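The entries of Table 1 follow directly from the closed-form expressions of this section, and a few lines of code make the dependence on the contaminating mass explicit. The sketch below is illustrative; the function and variable names are assumptions, and small differences in the last printed digit relative to the table can arise from intermediate rounding.

```python
import math

def Phi(z):
    """Standard normal d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

theta = 0.25
c = theta ** -0.5 * math.exp(-theta)     # truncation constant from (4.1)

def table1_row(eps):
    """Return (t, r, tail) for contaminating mass eps."""
    t = theta + math.log(1.0 - eps)                      # solves M(theta; G) = 0
    r = 0.5 * (t / theta + (1.0 + t) / (1.0 + theta))    # ratio (4.4)
    tail = Phi(-1.645 * r)                               # actual tail for nominal .05
    return t, r, tail

rows = {eps: table1_row(eps) for eps in (0.0, 0.01, 0.05, 0.10, 0.15, 0.20)}
for eps, (t, r, tail) in rows.items():
    print(f"{eps:4.2f}  {t:6.3f}  {r:6.3f}  {tail:6.3f}")
```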

5.  Smooth  score  functions 

In the example of the preceding section, one might argue that the
parameter values where problems arise are unlikely to occur in practice, or that
$c$ can be changed slightly. It is not, however, the non-normal limiting
distribution of $T_n$ at certain distributions that is of concern, but the
instability of inference based on $T_n$ near those distributions. This phenomenon
can alternatively be interpreted as a discontinuity of the asymptotic variance
functional $V(T(G);G) = \{m(T(G);G)\}^{-1} \int \psi^2\,dG\,\{m(T(G);G)\}^{-1}$; cf. Huber (1981, p.
51). In the neighborhood of a distribution where $V$ is discontinuous, estimates
of the variance of $T_n$ may be unstable.

Instability of this type can be avoided by requiring the M-estimator score
function to be smooth, for example, by replacing $\psi_c(\cdot)$ in (2.4) with a smooth
approximation. A natural way to construct such a function is by rescaling a
smooth distribution function.

Suppose $F$ is an absolutely continuous d.f. with density $f$ symmetric about
zero. Then

(5.1)  $\psi(x) = 2c\left\{F\!\left(\dfrac{x}{2cf(0)}\right) - \dfrac{1}{2}\right\}$

is monotone increasing, skew-symmetric about zero, and satisfies $\psi(\infty) = c$ and
$\psi'(0) = 1$. Observe that $\psi_c$ is obtained from (5.1) by taking $F$ to be the
uniform distribution on $[-1/2, 1/2]$. This can be approximated arbitrarily closely
by a symmetric beta distribution with a small value for the shape parameter,
i.e., $f(x) \propto \{(1/2 + x)(1/2 - x)\}^a$ on $[-1/2, 1/2]$. The resulting score function is
complicated, however, and its second derivative has jump discontinuities. A
more convenient choice is the logistic distribution, which leads to the smooth
function

$L_c(x) = c \tanh(x/c)$.

This has appeared previously. With $c = 2$, $L_c(x - t)$ is proportional to the
maximum likelihood score for the location of a logistic distribution with
scale 1. Holland and Welsch (1977) include an M-estimator using $L_c$ in a Monte
Carlo study of robust regression estimates.
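The rescaling construction can be verified directly for the two distributions mentioned. The sketch below assumes the form of (5.1) implied by the stated properties (limit $c$ at infinity and unit slope at zero); the function names are hypothetical and introduced only for this illustration. It confirms numerically that the uniform d.f. reproduces Huber's function and that the logistic d.f. yields $c\tanh(x/c)$.

```python
import math

def score_from_df(x, c, F, f0):
    """Rescaled d.f. as in (5.1): 2c {F(x / (2 c f(0))) - 1/2}.

    F is a d.f. symmetric about zero and f0 is its density at zero.
    """
    return 2.0 * c * (F(x / (2.0 * c * f0)) - 0.5)

def uniform_df(u):
    """D.f. of the uniform distribution on [-1/2, 1/2]; density 1 at zero."""
    return min(1.0, max(0.0, u + 0.5))

def logistic_df(u):
    """Standard logistic d.f.; density 1/4 at zero."""
    return 1.0 / (1.0 + math.exp(-u))

def huber_psi(x, c):
    """Huber's psi_c."""
    return max(-c, min(c, x))

def logistic_score(x, c):
    """Smooth score L_c(x) = c tanh(x / c)."""
    return c * math.tanh(x / c)

c = 1.5
grid = [k / 10.0 for k in range(-50, 51)]
max_gap_uniform = max(abs(score_from_df(x, c, uniform_df, 1.0) - huber_psi(x, c))
                      for x in grid)
max_gap_logistic = max(abs(score_from_df(x, c, logistic_df, 0.25) - logistic_score(x, c))
                       for x in grid)
print(max_gap_uniform, max_gap_logistic)
```

Both gaps should be at the level of floating-point rounding, which reflects the identity $2F(u) - 1 = \tanh(u/2)$ for the logistic d.f.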

For the important special case of estimating a Poisson parameter robustly,
a smooth version of the optimal M-estimator solves

(5.2)  $n^{-1} \sum_{i=1}^{n} L_c(x_i t^{-1/2} - \beta(t)) = 0$,

where $\beta(t)$ is defined in the usual manner.

Table 2 gives asymptotic variances $V$, and bounds on influence functions,
for the estimator defined by (5.2), labeled $L_c$, and the optimal estimator,
labeled $\psi_c$. In each case $c = 1.5$. The calculations are at the Poisson model,
and the variances and bounds are stabilized by dividing by $\theta$ and $\theta^{1/2}$
respectively.

Note that $V/\theta$ is the asymptotic relative efficiency of the maximum
likelihood estimator (sample mean) with respect to the corresponding M-estimator.
The asymptotic variances for the logistic score are slightly smaller than
those for the "optimal" score. This is possible because the bounds on the
influence function of $L_c$ are slightly higher than for $\psi_c$. In terms of performance
at the model, there appears to be little difference between $L_c$ and $\psi_c$.


Table 2  Asymptotic variances and influence function bounds at the Poisson model

               $\psi_c$                          $L_c$
  Mean $\theta$    $V/\theta$    bound$/\theta^{1/2}$      $V/\theta$    bound$/\theta^{1/2}$
    0.1      1.052    3.16           1.043    3.27
    0.2      1.107    2.24           1.081    2.53
    0.3      1.138    1.98           1.094    2.29
    0.4      1.114    2.00           1.095    2.19
    0.5      1.092    1.98           1.033    2.14
    1.0      1.071    1.34           1.059    2.07
    2.0      1.057    1.74           1.045    2.04
    5.0      1.043    1.75           1.033    2.02
   10.0      1.040    1.74           1.035    2.02
  100.0      1.037    1.73           1.033    2.01

6.  Further  remarks 


The  need  for  smooth  score  functions  is  most  clear  when  the  data  consist  of 
counts.  In  this  case  every  deviation  from  the  model  involves  point  masses. 

An important consequence of Theorem 1 is that Hampel's optimal estimator
(2.4)  is  indeed  optimal  as  claimed  when  the  model  distribution  is  discrete. 

It  would  be  disturbing  if  the  theory  were  to  break  down  at  a  countable  number 
of  parameter  values.  Moreover,  the  smooth  versions  discussed  in  Section  5, 
which  provide  more  stable  inference,  are  justified  for  every  parameter  value 
as  being  nearly  optimal. 

Although  the  discussion  has  focused  on  the  score  functions  arising  from 
Hampel's  optimality  theory,  it  is  not  limited  to  that  context.  For  instance, 
a score based on Hampel's three part redescending $\psi$ (see Huber (1981, p. 102))
will be prone to the same difficulties, and a smooth version will be more stable.

Appendix.  Proof  of  Theorem  2. 

Since the d.f. $H$ is continuous, uniform convergence in (3.7) will follow
from pointwise convergence via Polya's Theorem (Serfling (1980, p. 18)).

Write $M(t)$ for $M(t;G)$ and $m(t)$ for $m(t;G)$. Denote by $U(\delta)$ the set
$\{t: 0 < |t - \theta| < \delta\}$. By (B1), $m$ is defined on $U(\delta)$ if $\delta$ is sufficiently small.
Moreover, given $\varepsilon > 0$, there is a $\delta$ for which $t \in U(\delta)$ implies

$|m(t) - m(\theta-)| < \varepsilon$ if $t < \theta$

and

$|m(t) - m(\theta+)| < \varepsilon$ if $t > \theta$.

Choosing $\varepsilon < \min\{|m(\theta-)|, |m(\theta+)|\}$ then guarantees that $|m(t)|$ is bounded
away from zero on $U(\delta)$. Fix such a $\delta$.

Since $M(\theta) = 0$, $t \in U(\delta)$ implies

(A.1)  $M(t) = m(\tau)(t - \theta)$

for some $\tau$ strictly between $t$ and $\theta$, by the Mean Value Theorem (which only
requires one-sided derivatives at the endpoints of the interval on which it
is applied). Since $m$ is bounded away from zero on $U(\delta)$, (A.1) shows

$|t - \theta| = O(|M(t)|)$

as $t \to \theta$. The right hand side of (A.1) equals

(A.2)  $D(t)(t - \theta) + R(t)$,

where

$D(t) = m(\theta+)I(t > \theta) + m(\theta-)I(t < \theta)$,

$R(t) = [\{m(\tau) - m(\theta+)\}I(t > \theta) + \{m(\tau) - m(\theta-)\}I(t < \theta)](t - \theta)$,

and $I(A)$ is the indicator for the set $A$. Note that (A.2) also holds if $t = \theta$.
Since $R(t) = o(|t - \theta|) = o(|M(t)|)$, (A.1) and (A.2) yield

(A.3)  $D(T_n)\,n^{1/2}(T_n - \theta) = n^{1/2}M(T_n) + o(|n^{1/2}M(T_n)|)$.

Because of (B2), (B3) and the Lindeberg-Levy central limit theorem, the right
hand side of (A.3) converges in distribution to a $N(0,\sigma^2)$ random variable,
and, hence, so does the left hand side.

To obtain the limiting distribution of $T_n$, partition its range and consider
cases. If $z < 0$ then

$\mathrm{pr}\{n^{1/2}(T_n - \theta) \le z,\ T_n > \theta\} = 0$,

and

$\mathrm{pr}\{n^{1/2}(T_n - \theta) \le z,\ T_n < \theta\} = \mathrm{pr}\{|m(\theta-)|\,n^{1/2}(T_n - \theta) \le z\,|m(\theta-)|\}$.

Since $D(T_n) = m(\theta-)$ when $T_n < \theta$, and $D(t)$ does not change sign on $(\theta - \delta, \theta + \delta)$
by (B1), (A.3) implies that this last probability converges to $\Phi(|m(\theta-)|z/\sigma)$
as $n \to \infty$. Similar arguments establish that, for $z > 0$,

$\mathrm{pr}\{n^{1/2}(T_n - \theta) \le z,\ T_n < \theta\} = \mathrm{pr}\{|m(\theta-)|\,n^{1/2}(T_n - \theta) < 0\} \to \frac{1}{2}$

and

$\mathrm{pr}\{n^{1/2}(T_n - \theta) \le z,\ T_n > \theta\}$

$= \mathrm{pr}\{0 < |m(\theta+)|\,n^{1/2}(T_n - \theta) \le z\,|m(\theta+)|\} \to \Phi(|m(\theta+)|z/\sigma) - \frac{1}{2}$

and finally

$\mathrm{pr}\{n^{1/2}(T_n - \theta) \le 0\} = 1 - \mathrm{pr}\{|m(\theta+)|\,n^{1/2}(T_n - \theta) > 0\} \to \frac{1}{2}$

as $n \to \infty$. The result follows by collecting terms.


REFERENCES

Boos, D.D. and Serfling, R.J. (1980). A note on differentials and the CLT
and LIL for statistical functions, with applications to M-estimates.
Ann. Statist. 8, 618-624.

Carroll, R.J. (1978a). On almost sure expansions for M-estimates. Ann.
Statist. 6, 314-318.

Carroll, R.J. (1978b). On the asymptotic distribution of multivariate
M-estimates. J. Multivariate Anal. 8, 361-371.

Franklin, P. (1940). A Treatise on Advanced Calculus. Wiley, New York.

Hampel, F. (1968). Contributions to the theory of robust estimation. Ph.D.
thesis, University of California, Berkeley.

Hampel, F. (1971). A general qualitative definition of robustness. Ann.
Math. Statist. 42, 1887-1896.

Hampel, F. (1974). The influence curve and its role in robust estimation.
J. Amer. Statist. Assoc. 69, 383-393.

Holland, P.W. and Welsch, R.E. (1977). Robust regression using iteratively
reweighted least-squares. Commun. in Statist. A6, 813-827.

Huber, P.J. (1964). Robust estimation of a location parameter. Ann. Math.
Statist. 35, 73-101.

Huber, P.J. (1967). The behavior of maximum likelihood estimates under
nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium
on Mathematical Statistics and Probability, Vol. 1. University of
California Press, Berkeley.

Huber, P.J. (1981). Robust Statistics. Wiley, New York.

Krasker, W.S. (1980). Estimation in linear regression models with disparate
data points. Econometrica 48, 1333-1346.

Krasker, W.S. and Welsch, R.E. (1982). Efficient bounded-influence regression
estimation. J. Amer. Statist. Assoc. 77, 595-604.

Ruppert, D. (1985). On the bounded influence regression estimator of Krasker
and Welsch. J. Amer. Statist. Assoc. 80, 205-208.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics.
Wiley, New York.

Simpson, D.G. (1985). Some contributions to robust inference for discrete
probability models. Ph.D. dissertation, University of North Carolina,
Chapel Hill.

Stefanski, L.A., Carroll, R.J., and Ruppert, D. (1985). Optimally bounded
score functions for generalized linear models with applications to logistic
regression. (Tentatively accepted by Biometrika.)

Stigler, S.M. (1973). The asymptotic distribution of the trimmed mean.
Ann. Statist. 1, 472-477.


